[ROCm] Fix allreduce + RMSNorm fusion pattern matchin by rbrugaro-amd · Pull Request #41767 · vllm-project/vllm

rbrugaro-amd · 2026-05-06T00:02:22Z

Summary

Fixes two issues that broke the allreduce + RMSNorm fusion pass introduced in #37646, caused by subsequent refactoring in #36823:

1. torch.empty_like → torch.zeros_like in AiterAllreduceFusedRMSNormPattern._replacement (allreduce_rms_fusion.py): the fused allreduce+rmsnorm kernel always adds res_inp, so using empty_like leaves undefined values that corrupt outputs. Changed to zeros_like so the add is a no-op when the residual is freshly created.
2. Conditional variance_size_override argument in RMSNorm.forward_native (layernorm.py): after the IR refactoring, ir.ops.rms_norm and ir.ops.fused_add_rms_norm.maybe_inplace were unconditionally passed self.variance_size_override (even when it was None). This produced 4-argument calls in the FX graph, but the fusion patterns expect 3 arguments, so the patterns never matched. Fixed by unpacking variance_size_override into the call only when it is not None.
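The residual issue in the first item can be illustrated with a small, dependency-free reference. This is only a sketch in plain Python, not the AITER kernel: `rms_norm_ref` and `fused_add_rms_norm_ref` are hypothetical names, lists stand in for tensors, and the `garbage` values merely imitate whatever an uninitialized buffer might happen to contain.

```python
import math


def rms_norm_ref(x, weight, eps):
    """Plain RMSNorm over a 1-D list of floats (reference only)."""
    mean_sq = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [v * scale * w for v, w in zip(x, weight)]


def fused_add_rms_norm_ref(x, residual, weight, eps):
    """Reference for a fused op that ALWAYS adds the residual before normalizing.

    Returns (normalized output, updated residual).
    """
    summed = [a + b for a, b in zip(x, residual)]
    return rms_norm_ref(summed, weight, eps), summed


x = [1.0, -2.0, 3.0, 0.5]
weight = [1.0, 1.0, 1.0, 1.0]
eps = 1e-6

zeros = [0.0] * len(x)              # analogue of torch.zeros_like(x)
garbage = [7.0, -3.0, 1e3, 42.0]    # stand-in for uninitialized memory

expected = rms_norm_ref(x, weight, eps)
out_zero, res_zero = fused_add_rms_norm_ref(x, zeros, weight, eps)
out_garbage, _ = fused_add_rms_norm_ref(x, garbage, weight, eps)

# With a zero residual the unconditional add changes nothing.
print(out_zero == expected)
# With arbitrary residual contents the output no longer matches plain RMSNorm.
print(out_garbage == expected)
```

With a zero-filled residual the fused path reproduces plain RMSNorm and returns the input as the new residual, which is the property the zeros_like change relies on.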

Testing

Tested with Kimi-K2-Thinking-MXFP4 on 4x MI355X (TP=4)

vllm: 0.20.1rc1.dev153+gcfd2573f2 (base commit cfd2573f2)
aiter: amd-aiter 0.1.12.post2.dev126+g033d8b9db
torch: 2.10.0+git8514f05

Fusion pass results (confirmed via VLLM_DEBUG_DUMP_PATH graph dumps and custom logging):

all_reduce_fusion_pass: 244 pattern matches across 2 compile ranges (122 per range)
mla_dual_rms_norm_fusion_pass: 183 matches
Graph reduced from ~4491 → ~4367 nodes after fusion/cleanup passes
fused_add_rms_norm=aiter implementation selected

Signed-off-by: Rita Brugarolas Brufau <rita.brugarolasbrufau@amd.com>

gemini-code-assist

Code Review

This pull request updates the allreduce_rms_fusion pass to initialize the residual tensor with zeros instead of uninitialized memory. Additionally, it modifies the RMSNorm forward pass in layernorm.py to conditionally pass the variance_size_override argument only when it is not None. I have no feedback to provide as there were no review comments.

attila-dusnoki-htec · 2026-05-06T20:09:03Z

I tested this with 577b9623e6f8801698d411f4b04269326f5afbe2 base commit + PR#40392 change with kimi-k2.5-mxfp4 model.

Without this patch and "fuse_allreduce_rms": false:

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.9554|±  |0.052|
|     |       |strict-match    |     5|exact_match|↑  |0.9417|±  |0.0067|

Without this patch and "fuse_allreduce_rms": true:

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.008|±  |0.0056|
|     |       |strict-match    |     5|exact_match|↑  |0.004|±  |0.0040|

With this patch and "fuse_allreduce_rms": true:

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.952|±  |0.0135|
|     |       |strict-match    |     5|exact_match|↑  |0.952|±  |0.0135|

claude

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

ProExpertProg · 2026-05-07T17:31:08Z

                self.weight.data if self.pass_weight else None,
                self.variance_epsilon,
-                self.variance_size_override,
+                *(


Let's just fix the patterns in the pass instead
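For readers following this thread, the conditional-unpacking idiom from the PR summary can be sketched in isolation. The helper and the toy matcher below are hypothetical stand-ins, not vLLM or torch.fx APIs; they only show how an unconditionally passed None changes the argument count that a fixed-arity pattern would see.

```python
def build_norm_args(x, weight, eps, variance_size_override=None):
    """Hypothetical helper: append the override only when it is set."""
    extra = (variance_size_override,) if variance_size_override is not None else ()
    return (x, weight, eps, *extra)


def matches_three_arg_pattern(args):
    """Toy matcher standing in for a pattern that expects exactly three args."""
    return len(args) == 3


# Unconditional passing: None still occupies a fourth positional slot.
unconditional = ("x", "w", 1e-6, None)

# Conditional unpacking: the fourth slot appears only when an override exists.
conditional = build_norm_args("x", "w", 1e-6)
with_override = build_norm_args("x", "w", 1e-6, variance_size_override=128)

print(matches_three_arg_pattern(unconditional))
print(matches_three_arg_pattern(conditional))
print(len(with_override))
```

The alternative suggested in this comment would leave the call site untouched and instead make the fusion patterns expect the 4-argument form; the sketch does not model that variant.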

premit fixes

f73628c

Signed-off-by: Rita Brugarolas Brufau <rita.brugarolasbrufau@amd.com>

rbrugaro-amd changed the title ~~[ROCm] Fix allreduce + RMSNorm fusion pattern matchin~~ [WIP][ROCm] Fix allreduce + RMSNorm fusion pattern matchin May 6, 2026

mergify Bot added the rocm Related to AMD ROCm label May 6, 2026

github-project-automation Bot added this to AMD May 6, 2026

github-project-automation Bot moved this to Todo in AMD May 6, 2026

gemini-code-assist Bot reviewed May 6, 2026

Merge branch 'main' into rbrugaro/fix-allreduce-rms-fusion

082f324

rbrugaro-amd marked this pull request as ready for review May 7, 2026 15:50

rbrugaro-amd requested review from BoyuanFeng, ProExpertProg, vadiklyutiy, youkaichao and zou3519 as code owners May 7, 2026 15:50

claude Bot reviewed May 7, 2026

rbrugaro-amd mentioned this pull request May 7, 2026

[ROCm] Fix AITER AR+RMSNorm no-residual fusion #41972

Merged

Merge branch 'main' into rbrugaro/fix-allreduce-rms-fusion

3a05a16

ProExpertProg reviewed May 7, 2026

rbrugaro-amd changed the title ~~[WIP][ROCm] Fix allreduce + RMSNorm fusion pattern matchin~~ [ROCm] Fix allreduce + RMSNorm fusion pattern matchin May 7, 2026

rbrugaro-amd wants to merge 3 commits into vllm-project:main from rbrugaro-amd:rbrugaro/fix-allreduce-rms-fusion


3 participants
